Spikes as regularizers

Author

  • Anders Søgaard
Abstract

We present a confidence-based single-layer feed-forward learning algorithm SPIRAL (Spike Regularized Adaptive Learning) relying on an encoding of activation spikes. We adaptively update a weight vector relying on confidence estimates and activation offsets relative to previous activity. We regularize updates proportionally to item-level confidence and weight-specific support, loosely inspired by the observation from neurophysiology that high spike rates are sometimes accompanied by low temporal precision. Our experiments suggest that the new learning algorithm SPIRAL is more robust and less prone to overfitting than both the averaged perceptron and AROW.

* This research is funded by the ERC Starting Grant LOWLANDS No. 313695, as well as by the Danish Research Council.

1 Confidence-weighted Learning of Linear Classifiers

The perceptron [Rosenblatt, 1958] is a conceptually simple and widely used discriminative and linear classification algorithm. It was originally motivated by observations of how signals are passed between neurons in the brain. We will return to the perceptron as a model of neural computation, but from a more technical point of view, the main weakness of the perceptron as a linear classifier is that it is prone to overfitting. One particular type of overfitting that is likely to happen in perceptron learning is feature swamping [Sutton et al., 2006], i.e., that very frequent features may prevent co-variant features from being updated, leading to catastrophic performance if the frequent features are absent or less frequent at test time. In other words, in the perceptron, as well as in passive-aggressive learning [Crammer et al., 2006], parameters are only updated when features occur, and rare features therefore often receive inaccurate values.

There are several ways to approach such overfitting, e.g., capping the model's supremum norm, but here we focus on a specific line of research: confidence-weighted learning of linear classifiers. Confidence-weighted learning explicitly estimates confidence during induction, often by maintaining Gaussian distributions over parameter vectors. In other words, each model parameter is interpreted as a mean and augmented with a covariance estimate. Confidence-Weighted Learning (CWL) [Dredze et al., 2008] was the first learning algorithm to do this, but Crammer et al. [2009] later introduced Adaptive Regularization of Weight Vectors (AROW), which is a simpler and more effective alternative: AROW passes over the data, item by item, computing a margin, i.e., a dot product of a weight vector μ and the item, and updating μ and a covariance matrix Σ in a standard additive fashion. As in CWL, the weights, which are interpreted as means, and the covariance matrix form a Gaussian distribution over the weight vectors. Specifically, the confidence is x⊤Σx. We add a smoothing constant r (= 0.1) and compute the learning rate α adaptively:

\alpha = \frac{\max(0,\, 1 - y\, x^{\top}\mu)}{x^{\top}\Sigma x + r}    (1)

We then update μ proportionally to α, and update the covariance matrix as follows:

\Sigma \leftarrow \Sigma - \frac{\Sigma x x^{\top}\Sigma}{x^{\top}\Sigma x + r}    (2)

CWL and AROW have been shown to be more robust than the (averaged) perceptron in several studies [Crammer et al., 2012, Søgaard and Johannsen, 2012], but below we show that replacing binary activations with samples from spikes can lead to better regularized and more robust models.
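For concreteness, the AROW pass described above (Eqs. 1–2) can be written out in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the function name arow_train, the epochs parameter, the identity initialization of Σ, and applying the update only when the hinge loss is positive follow the standard formulation of Crammer et al. [2009], not anything stated explicitly in this excerpt.

```python
import numpy as np

def arow_train(X, y, r=0.1, epochs=1):
    """AROW: maintain a Gaussian (mu, Sigma) over weight vectors.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns the learned mean weight vector mu and covariance Sigma.
    """
    n, d = X.shape
    mu = np.zeros(d)     # means: the actual linear weights
    Sigma = np.eye(d)    # covariance: per-weight (un)certainty
    for _ in range(epochs):
        for x, yi in zip(X, y):
            conf = x @ Sigma @ x                      # confidence x^T Sigma x
            loss = max(0.0, 1.0 - yi * (x @ mu))      # hinge loss on the margin y x^T mu
            if loss > 0.0:
                alpha = loss / (conf + r)             # Eq. (1): adaptive learning rate
                mu = mu + alpha * yi * (Sigma @ x)    # update means proportionally to alpha
                Sx = Sigma @ x
                Sigma = Sigma - np.outer(Sx, Sx) / (conf + r)   # Eq. (2)
    return mu, Sigma
```

At prediction time only the means are used: a new item x is labelled sign(x @ mu); Σ merely controls how aggressively individual weights were updated during training.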
2 Spikes as Regularizers

2.1 Neurophysiological motivation

Neurons do not fire synchronously at a constant rate. Neural signals are spike-shaped, with an onset, an increase in signal followed by a spike and a decrease in signal, and an inhibition of the neuron before it returns to its equilibrium. Below we simplify the picture a bit by assuming that spikes are bell-shaped (Gaussians).

The learning algorithm (SPIRAL) which we propose below is motivated by the observation that spike rate (the speed at which a neuron fires) increases the more a neuron fires [Kawai and Sterling, 2002, Keller and Takahashi, 2015]. Furthermore, Keller and Takahashi [2015] show that increased activity may lead to spiking at higher rates with lower temporal precision. This means that more active neurons are less successful in passing on signals, leading the neuron to return to a more stable firing rate. In other words, the brain performs implicit regularization by exhibiting low temporal precision at high spike rates. This prevents highly active neurons from swamping other co-variant, but less active, neurons. We hypothesise that implementing a similar mechanism in our learning algorithms will prevent feature swamping in a similar fashion. Finally, Blanco et al. [2015] show that periods of increased spike rate lead to a smaller standard deviation in the synaptic weights. This loosely inspired us to implement the temporal imprecision at high spike rates by decreasing the weight's standard deviation.

2.2 The algorithm

In a single-layer feed-forward model, such as the perceptron, sampling from Gaussian spikes only affects the input, and we can therefore implement our regularizer as noise injection [Bishop, 1995]. The variance is the relative confidence of the model on the input item (the same for all parameters), and the means are the parameter values. We multiply the input by the inverse of the sample, reflecting the intuition that highly active neurons are less precise and more likely to drop out, before we clip the sample to the range from 0 to 1. We give the pseudocode in Algorithm 1, following the conventions in Crammer et al. [2009] (a sketch of one possible reading of this update is given in code below).

3 Experiments

3.1 Main experiments

We extract 10 binary classification problems from MNIST, training on odd data points and testing on even ones. Since our algorithm is parameter-free, we did not do explicit parameter tuning, but during the implementation of SPIRAL we only experimented with the first of these ten problems (upper left corner). To test the robustness of SPIRAL relative to the perceptron and AROW, we randomly corrupt the input at test time by removing features. Our set-up is inspired by Globerson and Roweis [2006]. In the plots in Figure 2, the x-axis gives the number of features kept (not deleted). We observe two tendencies in the results: (i) SPIRAL outperforms the perceptron consistently with up to 80% of the features, and sometimes by a very large margin; except that in 2/10 cases, the perceptron
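Algorithm 1 is not reproduced in this excerpt, so the following NumPy sketch is only one plausible reading of the prose in Section 2.2, layered on top of the AROW-style update above. The names spiral_mask and spiral_train are hypothetical, and two details are our assumptions: clipping the sampled spike to [0, 1] before using it, and taking "the inverse of the sample" to mean multiplying the input by (1 − spike), so that inputs the model is already confident about see a noisier, partially dropped-out signal.

```python
import numpy as np

def spiral_mask(x, mu, Sigma, r=0.1, rng=None):
    """Spike-based noise injection on the input (our reading of Sec. 2.2).

    Samples one 'spike' per parameter from a Gaussian whose means are the
    current weights mu and whose shared variance is the item-level confidence
    x^T Sigma x, clips the sample to [0, 1], and damps the input accordingly.
    """
    rng = np.random.default_rng() if rng is None else rng
    conf = x @ Sigma @ x + r                      # item-level confidence (shared variance)
    spike = rng.normal(loc=mu, scale=np.sqrt(conf), size=mu.shape)
    spike = np.clip(spike, 0.0, 1.0)              # clip the sample to [0, 1]
    return x * (1.0 - spike)                      # 'inverse of the sample': our interpretation

def spiral_train(X, y, r=0.1, epochs=1, seed=0):
    """AROW-style pass over the data with spike noise injected into each input."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu, Sigma = np.zeros(d), np.eye(d)
    for _ in range(epochs):
        for x, yi in zip(X, y):
            xn = spiral_mask(x, mu, Sigma, r, rng)           # regularize via noisy input
            conf = xn @ Sigma @ xn
            loss = max(0.0, 1.0 - yi * (xn @ mu))
            if loss > 0.0:
                alpha = loss / (conf + r)
                mu = mu + alpha * yi * (Sigma @ xn)
                Sxn = Sigma @ xn
                Sigma = Sigma - np.outer(Sxn, Sxn) / (conf + r)
    return mu, Sigma
```

Under this reading, the robustness experiment in Section 3.1 amounts to zeroing out a random subset of input features at test time and tracking the accuracy of sign(x @ mu) as the number of retained features shrinks.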

Similar resources

Linguistic Structured Sparsity in Text Categorization

We introduce three linguistically motivated structured regularizers based on parse trees, topics, and hierarchical word clusters for text categorization. These regularizers impose linguistic bias in feature weights, enabling us to incorporate prior knowledge into conventional bagof-words models. We show that our structured regularizers consistently improve classification accuracies compared to ...

Extending the quadratic taxonomy of regularizers for nonparametric registration

Quadratic regularizers are used in nonparametric registration to ensure that the registration problem is well posed and to yield solutions that exhibit certain types of smoothness. Examples of popular quadratic regularizers include the diffusion, elastic, fluid, and curvature regularizers. Two important features of these regularizers are whether they account for coupling of the spatial componen...

Smoothing Regularizers for Projective Basis Function Networks

Smoothing regularizers for radial basis functions have been studied extensively, but no general smoothing regularizers for projective basis functions (PBFs), such as the widely-used sigmoidal PBFs, have heretofore been proposed. We derive new classes of algebraically-simplemth-order smoothing regularizers for networks of projective basis functions f(W;x) = PNj=1 ujg xTvj + vj0 + u0; with genera...

High-dimensional Inference via Lipschitz Sparsity-Yielding Regularizers

Non-convex regularizers are more and more applied to high-dimensional inference with sparsity prior knowledge. In general, the nonconvex regularizer is superior to the convex ones in inference but it suffers the difficulties brought by local optimums and massive computation. A ”good” regularizer should perform well in both inference and optimization. In this paper, we prove that some non-convex...

Convex Relaxation of Vectorial Problems with Coupled Regularization

We propose convex relaxations for nonconvex energies on vector-valued functions which are tractable yet as tight as possible. In contrast to existing relaxations, we can handle the combination of nonconvex data terms with coupled regularizers such as l-regularizers. The key idea is to consider a collection of hypersurfaces with a relaxation that takes into account the entire functional rather t...

Primal-Dual convex optimization in large deformation diffeomorphic registration with robust regularizers

This paper proposes a method for primal-dual convex optimization in variational Large Deformation Diffeomorphic Metric Mapping (LDDMM) problems formulated with robust regularizers and image similarity metrics. The method is based on Chambolle and Pock primal-dual algorithm for solving general convex optimization problems. Diagonal preconditioning is used to ensure the convergence of the algorit...

Journal:
  • CoRR

Volume: abs/1611.06245

Pages: -

Publication date: 2016